Abstract

In Chinese, nouns need numeral classifiers to express quantity. In this paper, we explore the relationship between classifiers and nouns. We extract a set of lexical, syntactic and ontological features and the corresponding noun-classifier pairs from a corpus and then train SVMs to assign classifiers to nouns. We analyse which features are most important for this task.

1 Introduction

In English, numbers directly modify count nouns, as in ‘two apples’ and ‘five computers’. Numbers cannot directly modify mass nouns; instead, an embedded noun phrase must be formed, e.g. ‘five slices of bread’. In Chinese, however, all nouns need numeral classifiers to express quantity¹. When translating from English to Chinese, we may therefore need to choose Chinese classifiers to form noun phrases. The difference between the two languages can be seen in the following two examples:

两[liang] 个[ge] 苹果[pingguo] (Chinese)
two apples (English)

and

五[wu] 片[pian] 面包[mianbao] (Chinese)
five slices of bread (English)

Noun-classifier combinations appear with high frequency in Chinese. There are more than 500 classifiers, although fewer than 200 of them are frequently used. Each classifier can only be used with certain classes of noun, and nouns in a class usually have similar properties. For example, nouns that can be used with the classifier ‘根[gen]’ include the words for straw, chopstick and pipe; all these objects are long and thin. However, nouns with similar properties sometimes belong to different classes. For example, ‘牛’ (cow), ‘马’ (horse) and ‘羊’ (lamb) are all livestock, but they take different classifiers. This means that classifier assignment is not entirely rule-based but partly idiomatic.

¹ Proper nouns and bare noun phrases do not need classifiers.

In this paper, we explore the relationship between classifiers and nouns. We extract a set of features and the corresponding noun-classifier attachments from a corpus and then train SVMs to assign classifiers to nouns. In Section 4 we describe our data set. In Section 5 we describe our experiments. In Section 6 we present our results.

2 Related Work

Many Asian languages (e.g. Chinese, Korean, Japanese and Thai) have numeral classifier systems, and previous work on noun-classifier matching has been done for these languages. (Sornlertlamvanich et al., 1994) present an algorithm for selecting an appropriate classifier for a noun in Thai. The general idea is to extract noun-classifier collocations from a corpus and output a list of noun-classifier pairs with frequency information. During noun phrase generation, the most frequently co-occurring classifier for a given noun is selected. However, no evaluation is reported for this algorithm.

The algorithm described in (Paik and Bond, 2001) generates Japanese and Korean numeral classifiers using semantic classes from an ontology. The authors assigned classifiers by hand to each of the 2,710 semantic classes in the ontology. During generation, nouns in each semantic class are assigned the associated classifier. The classifier assignment accuracy is 81% for Japanese classifiers and 62% for Korean classifiers. However, the evaluation set contains only 90 noun phrases, which is rather small. Furthermore, attaching classifiers to an ontology by hand is laborious, and this approach has difficulty with cases like the livestock example mentioned earlier, where nouns in the same semantic class take different classifiers.

(Paul et al., 2002) present a method for extracting classifier information from a bilingual (Japanese-English) corpus based on phrasal correspondences in the sentential context. Bilingual sentence pairs are compared to find noun-classifier collocations. The evaluation was done by a human. The precision is high (84.2%), but the recall is only about 40% because the algorithm gives no output for half of the nouns.
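The link between these two figures is simple arithmetic: if a system answers only a fraction of the items and is correct on a fraction of those it answers, its recall is roughly the product of the two. A minimal sketch, assuming recall is measured over all nouns and that coverage is exactly one half (the text says only "half of the nouns"):

```python
def recall_from_precision(precision: float, coverage: float) -> float:
    """Recall of a system that answers a fraction `coverage` of all items
    and is correct on a fraction `precision` of the items it answers."""
    return precision * coverage

# Paul et al. (2002): precision 84.2%; coverage of one half is an assumption
print(round(recall_from_precision(0.842, 0.5), 3))  # 0.421, i.e. "about 40%"
```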

In contrast to these algorithms, our approach is based on a large data set, uses machine learning, and does not require classifiers to be attached to an ontology by hand.

3 Support Vector Machines

Support Vector Machines (SVMs) are a type of classifier first introduced in (Boser et al., 1992). In the last few years SVMs have become an important and active field in machine learning research. The SVM algorithm detects and exploits complex patterns in data.

A binary SVM is a maximum margin classifier. Given a set of training data $\{x_1, x_2, \ldots, x_n\}$ with corresponding labels $y_1, y_2, \ldots, y_n \in \{+1, -1\}$, a binary SVM divides the input space into two regions at a decision boundary, which is a separating hyperplane $\langle w, x\rangle + b = 0$ (Figure 1). The decision boundary should classify all points correctly, that is:

$$y_i(\langle w, x_i\rangle + b) > 0, \quad \forall i$$

Figure 1: The input space and hyperplane

Also, the decision boundary should have the maximum separating margin with respect to the two classes. If we rescale $w$ and $b$ so that the closest point(s) to the hyperplane satisfy $|\langle w, x_i\rangle + b| = 1$, then the margin equals $1/\|w\|$ and the problem can be formulated as:


$$\text{minimize} \quad \frac{1}{2}\|w\|^2 \qquad \text{subject to} \quad y_i(\langle w, x_i\rangle + b) \ge 1, \quad \forall i$$
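The hard-margin conditions can be checked directly for a candidate $(w, b)$. The sketch below is illustrative only: the toy points and the hyperplane are made up, and `decision_value`, `satisfies_hard_margin` and `margin` are hypothetical helper names, not part of any SVM library.

```python
import math

def decision_value(w, b, x):
    """Compute <w, x> + b for vectors given as plain lists."""
    return sum(wk * xk for wk, xk in zip(w, x)) + b

def satisfies_hard_margin(w, b, xs, ys, tol=1e-12):
    """True if y_i(<w, x_i> + b) >= 1 holds for every training point."""
    return all(y * decision_value(w, b, x) >= 1 - tol for x, y in zip(xs, ys))

def margin(w):
    """Distance 1/||w|| from the hyperplane to the closest points (after rescaling)."""
    return 1.0 / math.sqrt(sum(wk * wk for wk in w))

xs = [[2.0, 0.0], [-2.0, 0.0]]   # made-up toy points
ys = [+1, -1]
w, b = [0.5, 0.0], 0.0           # candidate hyperplane x_1 = 0
print(satisfies_hard_margin(w, b, xs, ys), margin(w))  # True 2.0
```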

The generalized Lagrange function is:

$$L(w, b, \alpha) = \frac{1}{2}\langle w, w\rangle - \sum_{i=1}^{n} \alpha_i \left[ y_i(\langle w, x_i\rangle + b) - 1 \right]$$

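So we can transform the problem to its dual:

$$\text{maximize} \quad W(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \langle x_i, x_j\rangle \qquad \text{subject to} \quad \alpha_i \ge 0, \quad \sum_{i=1}^{n} \alpha_i y_i = 0$$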
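To make the dual concrete, the sketch below evaluates $W(\alpha)$ and its constraints on a made-up two-point problem, $x_1 = (1, 0)$ with $y_1 = +1$ and $x_2 = (-1, 0)$ with $y_2 = -1$. Here $\sum_i \alpha_i y_i = 0$ forces $\alpha_1 = \alpha_2 = a$ and $W$ reduces to $2a - 2a^2$, so a brute-force scan suffices. This is for illustration only; real implementations such as LIBSVM use a dedicated QP solver instead.

```python
def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def dual_objective(alphas, xs, ys):
    """W(alpha) = sum_i alpha_i - 1/2 sum_{i,j} alpha_i alpha_j y_i y_j <x_i, x_j>."""
    n = len(alphas)
    quad = sum(alphas[i] * alphas[j] * ys[i] * ys[j] * dot(xs[i], xs[j])
               for i in range(n) for j in range(n))
    return sum(alphas) - 0.5 * quad

def dual_feasible(alphas, ys, tol=1e-12):
    """Check alpha_i >= 0 for all i and sum_i alpha_i y_i = 0."""
    return (all(a >= -tol for a in alphas)
            and abs(sum(a * y for a, y in zip(alphas, ys))) <= tol)

xs, ys = [[1.0, 0.0], [-1.0, 0.0]], [+1, -1]
# Feasibility forces alpha_1 = alpha_2 = a here, so scan a on a grid in [0, 1].
best = max((i / 100 for i in range(101)), key=lambda a: dual_objective([a, a], xs, ys))
print(best, dual_objective([best, best], xs, ys))  # 0.5 0.5
```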

This is a convex quadratic programming (QP) problem, so its global maximum over $\alpha$ can always be found. We can then recover $w$ and $b$ for the hyperplane by:

$$w = \sum_{i=1}^{n} \alpha_i y_i x_i$$

$$b = -\frac{\max_{y_i = -1} \langle w, x_i\rangle + \min_{y_i = +1} \langle w, x_i\rangle}{2}$$
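Given the optimal multipliers, the two recovery formulas can be applied mechanically. A minimal sketch on a made-up one-dimensional problem ($x_1 = 3$, $y_1 = +1$; $x_2 = 1$, $y_2 = -1$): the constraint forces $\alpha_1 = \alpha_2 = a$, $W$ reduces to $2a - 2a^2$ and is maximised at $a = 0.5$, and the expected hyperplane is $x = 2$, i.e. $w = 1$, $b = -2$. The function name `recover_w_b` is hypothetical.

```python
def recover_w_b(alphas, xs, ys):
    """w = sum_i alpha_i y_i x_i;
    b = -(max_{y_i=-1} <w, x_i> + min_{y_i=+1} <w, x_i>) / 2."""
    dim = len(xs[0])
    w = [sum(a * y * x[d] for a, y, x in zip(alphas, ys, xs)) for d in range(dim)]

    def dot(u, v):
        return sum(p * q for p, q in zip(u, v))

    neg = max(dot(w, x) for x, y in zip(xs, ys) if y == -1)  # closest negative
    pos = min(dot(w, x) for x, y in zip(xs, ys) if y == +1)  # closest positive
    return w, -(neg + pos) / 2

w, b = recover_w_b([0.5, 0.5], [[3.0], [1.0]], [+1, -1])
print(w, b)  # [1.0] -2.0
```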

If the points in the input space are not linearly separable, we allow ‘slack variables’ $\xi_i$ in the classification. We then need to find a soft margin hyperplane, e.g.:

$$\text{minimize} \quad \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n} \xi_i \qquad \text{subject to} \quad y_i(\langle w, x_i\rangle + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad \forall i$$

where $C > 0$ controls the trade-off between a wide margin and training errors. Once again, a QP solver can be used to find the solution.
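For a fixed $(w, b)$, the smallest slack that satisfies each constraint is $\xi_i = \max(0, 1 - y_i(\langle w, x_i\rangle + b))$, so the soft-margin objective can be evaluated directly. A minimal sketch on made-up data; a point with $\xi_i > 1$ lies on the wrong side of the hyperplane, and a larger $C$ penalises such points more heavily.

```python
def soft_margin_objective(w, b, xs, ys, C):
    """Return (1/2 ||w||^2 + C * sum_i xi_i, slacks), using the smallest
    feasible slack xi_i = max(0, 1 - y_i(<w, x_i> + b)) for each point."""
    slacks = [max(0.0, 1.0 - y * (sum(wk * xk for wk, xk in zip(w, x)) + b))
              for x, y in zip(xs, ys)]
    return 0.5 * sum(wk * wk for wk in w) + C * sum(slacks), slacks

# Made-up 1-D data: the third point (label -1) sits on the positive side of x = 2.
obj, slacks = soft_margin_objective([1.0], -2.0, [[3.0], [1.0], [2.5]], [+1, -1, -1], C=1.0)
print(obj, slacks)  # 2.0 [0.0, 0.0, 1.5]
```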

For our task we need multi-class SVMs. To get multi-class SVMs, we can construct and combine several binary SVMs (e.g. one-against-one or one-against-all), or we can directly consider all data in one optimization formulation.

Many SVM implementations are available on the web. We chose LIBSVM (Chang and Lin, 2001), which is an efficient multi-class implementation. LIBSVM uses the “one-against-one” approach, in which $k(k-1)/2$ classifiers are constructed for $k$ classes and each one trains on data from two different classes (Hsu and Lin, 2002).
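The one-against-one scheme can be sketched as follows. At prediction time each pairwise classifier votes for one of its two classes and the class with the most votes wins; the toy pairwise decisions below are hypothetical stand-ins for trained binary SVMs, and breaking ties by list order is our own choice, not necessarily LIBSVM's. With the 212 classifiers found in our corpus (Section 4.1), this scheme would need up to $212 \cdot 211 / 2 = 22{,}366$ binary SVMs.

```python
from collections import Counter
from itertools import combinations

def n_binary_svms(k):
    """Number of pairwise classifiers for k classes: k(k-1)/2."""
    return k * (k - 1) // 2

def one_against_one_predict(classes, pairwise, x):
    """pairwise[(a, b)](x) returns a or b; the class with most votes wins.
    Ties go to the class listed first in `classes` (our assumption)."""
    votes = Counter(pairwise[(a, b)](x) for a, b in combinations(classes, 2))
    return max(classes, key=lambda c: votes[c])

classes = ["个", "位", "名"]
pairwise = {                      # hypothetical fixed decisions
    ("个", "位"): lambda x: "位",
    ("个", "名"): lambda x: "个",
    ("位", "名"): lambda x: "位",
}
print(one_against_one_predict(classes, pairwise, x=None))  # 位
print(n_binary_svms(212))  # 22366
```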

4 Data and Resources

We use the Penn Chinese Treebank (Xue et al., 2002) as our corpus and the ontology/lexicon HowNet (Dong and Dong, 2000) to get ontological features for nouns. We train SVMs on different feature sets to see which set(s) of features are important for noun-classifier matching.

4.1 Penn Chinese Treebank

The Penn Chinese Treebank is a 500,000-word Chinese corpus annotated with both part-of-speech (POS) tags and syntactic brackets.

We automatically extract noun phrases that contain classifiers from the corpus. An example noun phrase (translation: ‘a major commercial waterway’) is:

(IP
  (NP (QP (CD 一) (CLP (M 条)))
      (NP (NN …))
      (ADJP (JJ …))
      (NP (NN …)))
)

The word in (CLP (M 条[tiao])) is the classifier, and the head noun of the noun phrase is the final NN (‘waterway’). In Section 5.3 we describe a set of features we obtain from each noun phrase and the sentence in which it is embedded.
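A minimal sketch of this extraction step, assuming (our heuristic, not necessarily the procedure used for the experiments) that the classifier is the M leaf inside the NP's QP and that the head noun is the rightmost NN among the NP's other children. The input is the introduction's ‘two apples’ example, bracketed by hand in Treebank style rather than taken from the corpus; the parser also assumes every bracket starts with a label.

```python
import re

def parse_tree(text):
    """Parse a bracketed tree into nested lists [label, child, ...]; leaves are strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse():
        nonlocal pos
        pos += 1                      # skip "("
        node = [tokens[pos]]          # constituent or POS label
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.append(parse())
            else:
                node.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return node

    return parse()

def leaves_with_tag(node, tag):
    """Words whose preterminal label is `tag`, in left-to-right order."""
    if isinstance(node, str):
        return []
    if node[0] == tag and len(node) == 2 and isinstance(node[1], str):
        return [node[1]]
    return [w for child in node[1:] for w in leaves_with_tag(child, tag)]

def extract_pairs(node):
    """(head noun, classifier) for every NP that has a QP child containing an M leaf."""
    pairs = []
    if isinstance(node, str):
        return pairs
    if node[0] == "NP":
        qps = [c for c in node[1:] if isinstance(c, list) and c[0] == "QP"]
        rest = [c for c in node[1:] if isinstance(c, list) and c[0] != "QP"]
        if qps:
            classifiers = leaves_with_tag(qps[0], "M")
            nouns = [w for c in rest for w in leaves_with_tag(c, "NN")]
            if classifiers and nouns:
                pairs.append((nouns[-1], classifiers[0]))  # rightmost NN as head
    for child in node[1:]:
        pairs.extend(extract_pairs(child))
    return pairs

tree = "(IP (NP (QP (CD 两) (CLP (M 个))) (NP (NN 苹果))))"
print(extract_pairs(parse_tree(tree)))  # [('苹果', '个')]
```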

In our corpus, there are 61,587 noun occurrences (12,225 unique nouns) and 3,940 classifier-noun co-occurrences (212 unique classifiers). Whether a noun needs a classifier at all is decided by a trivial rule: if a noun is preceded by a quantifier or a determiner, then a classifier is needed; otherwise it is not. Hence, we only focus on noun-classifier pairs. The most frequently occurring classifier in this corpus is ‘个[ge]’, which occurs with 497 unique nouns. In this corpus, 87 classifiers occur in only one noun-classifier pair.

4.2 HowNet

We get ontological features of nouns from HowNet. HowNet is a bilingual Chinese-English lexicon and ontology. Each word sense is assigned to a concept containing ontological features. HowNet uses basic meaning units named sememes to construct concepts.

Table 1 shows an example entry in HowNet, for the word ‘作家’ (writer). The sememe at the first position, ‘human’, is the categorical attribute, which describes the general category of the concept. The sememes following the first sememe are additional attributes, which give additional specific features. There are two types of pointer in the definition, ‘#’ and ‘*’. ‘#’ means ‘related’, so ‘#occupation’ shows that the concept has a relationship with ‘occupation’. ‘*’ means ‘agent’, so ‘*compile’ shows that ‘writer’ is the agent of ‘compile’. The sememes ‘#readings’ and ‘literature’ show that the job of a ‘writer’ is to compile ‘readings’ about ‘literature’.
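The pointer conventions above can be turned into separate feature groups. A minimal sketch, assuming a DEF string of comma-separated sememes as in Table 1; the optional Chinese gloss after ‘|’ that the code strips reflects our assumption about the raw HowNet file format, and `parse_def` is a hypothetical name.

```python
def parse_def(definition: str) -> dict:
    """Split a HowNet DEF string into categorical and pointer-marked sememes."""
    sememes = [s.split("|")[0].strip() for s in definition.split(",")]
    sememes = [s for s in sememes if s]
    features = {"category": sememes[0], "related": [], "agent_of": [], "other": []}
    for s in sememes[1:]:
        if s.startswith("#"):        # '#' pointer: related concept
            features["related"].append(s[1:])
        elif s.startswith("*"):      # '*' pointer: this concept is the agent
            features["agent_of"].append(s[1:])
        else:                        # plain additional attribute
            features["other"].append(s)
    return features

print(parse_def("human,#occupation,*compile,#readings,literature"))
# {'category': 'human', 'related': ['occupation', 'readings'], 'agent_of': ['compile'], 'other': ['literature']}
```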

We use HowNet 2000, which contains 120,496 entries for about 65,000 Chinese words defined with a set of 1,503 sememes. It is large enough for our task: we can get ontological features for 94.71% of the nouns in the Penn Chinese Treebank. For the nouns that are not in HowNet, we leave the ontological features blank.

5 Experiments

We use six different feature sets to assign classifiers to nouns. To evaluate each feature set, we perform 10-fold cross-validation. We report our results in Section 6.
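10-fold cross-validation can be sketched as below. The paper does not say how the folds were formed; a plain shuffled split of the 3,940 noun-classifier pairs is our assumption, as is the fixed random seed.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation over n items."""
    order = list(range(n))
    random.Random(seed).shuffle(order)        # assumption: a plain random split
    folds = [order[i::k] for i in range(k)]   # k disjoint folds of (nearly) equal size
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]

# 3,940 noun-classifier pairs, as counted in Section 4.1
print([len(test) for _, test in k_fold_indices(3940)])  # ten folds of 394 pairs each
```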

5.1 Baseline Algorithm

In the training data, we count the number of times each classifier appears with a given noun. We assign to each noun in the testing data its most frequently co-occurring classifier (cf. (Sornlertlamvanich et al., 1994)). If a noun does not appear in the training data, we assign the classifier ‘个[ge]’, the classifier which appears most frequently overall in the corpus.

No.: 114303
W_C (word in Chinese): 作家
E_C (example in Chinese):
G_C (POS tag in Chinese): N
W_E (word in English): writer
E_E (example in English):
G_E (POS tag in English): N
DEF (concept definition): human,#occupation,*compile,#readings,literature

Table 1: An entry in HowNet

Lexical Features   | Syntactic Features
noun               | POS of noun
first premod       | POS of first premod
second premod      | POS of second premod
main verb          | POS of main verb
                   | total number of premodifiers
                   | sentType
                   | embedded in vp or pp
                   | quoted or not

Table 2: Features extracted from training data
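The baseline of Section 5.1 amounts to a frequency table with a fallback. A minimal sketch with made-up training pairs; note that here the fallback classifier is computed from the training pairs, whereas the text takes the most frequent classifier in the whole corpus (‘个[ge]’).

```python
from collections import Counter, defaultdict

def train_baseline(pairs):
    """pairs: (noun, classifier) tuples from the training folds."""
    per_noun = defaultdict(Counter)
    overall = Counter()
    for noun, clf in pairs:
        per_noun[noun][clf] += 1
        overall[clf] += 1
    best = {noun: counts.most_common(1)[0][0] for noun, counts in per_noun.items()}
    fallback = overall.most_common(1)[0][0]  # most frequent classifier overall
    return best, fallback

def predict_baseline(model, noun):
    best, fallback = model
    return best.get(noun, fallback)  # unseen nouns get the overall favourite

pairs = [("苹果", "个")] * 3 + [("面包", "片"), ("金牌", "枚"), ("金牌", "块"), ("金牌", "枚")]
model = train_baseline(pairs)
print(predict_baseline(model, "金牌"), predict_baseline(model, "马"))  # 枚 个
```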

5.2 Noun Features

Since classifiers are assigned mostly based on the noun, the most important features for classifier prediction should be features of the noun. We ran four different experiments for noun features:

• (1) The feature set includes only the noun itself.

• (2) The feature set includes ontological features of the noun only. If classifiers are associated with semantic categories (cf. (Paik and Bond, 2001)), we should be able to assign classifiers based on the ontological features of nouns.

• (3) The feature set includes the noun and ontological features.

• (4) Two SVMs are trained: one on the noun only, and one on ontological features only. During testing, nouns in the training set are assigned classifiers using the first SVM; other nouns are assigned classifiers using the second SVM.
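Experiment (4) is a simple back-off between two models. In the sketch below the two trained SVMs are replaced by hypothetical stand-in functions; the toy ontology model maps the HowNet category ‘livestock’ to ‘头’, which shows how back-off to a shared semantic class can go wrong (‘猫’, cat, normally takes ‘只’).

```python
def backoff_predict(noun, ontology_features, seen_nouns, noun_model, ontology_model):
    """Experiment (4): use the noun model for nouns seen in training,
    otherwise fall back to the model trained on ontological features."""
    if noun in seen_nouns:
        return noun_model(noun)
    return ontology_model(ontology_features)

# Hypothetical stand-ins for the two trained SVMs.
noun_table = {"苹果": "个", "牛": "头"}
seen = set(noun_table)

def noun_model(noun):
    return noun_table[noun]

def ontology_model(features):
    return "头" if features.get("category") == "livestock" else "个"

print(backoff_predict("牛", {"category": "livestock"}, seen, noun_model, ontology_model))  # 头
print(backoff_predict("猫", {"category": "livestock"}, seen, noun_model, ontology_model))  # 头
```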

5.3 Context Features

In this set of experiments, we used features from both the noun and the context. The features we used can be categorized into two groups, lexical features and syntactic features; they are shown in Table 2.

We ran two experiments using this set of features:

• (5) The feature set includes the noun, lexical and syntactic features only.

• (6) The feature set includes the noun, lexical, syntactic and ontological features.
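Categorical features such as those in Table 2 have to be mapped to numeric vectors before LIBSVM can use them. LIBSVM reads one example per line as `<label> <index>:<value> ...` with indices in ascending order; a common choice, assumed here rather than taken from the paper, is one binary feature per name=value string. The feature names and the label id below are hypothetical, and missing values (e.g. a noun absent from HowNet) are simply omitted.

```python
class FeatureIndexer:
    """Maps 'name=value' strings to 1-based feature indices for LIBSVM input."""

    def __init__(self):
        self.index = {}

    def encode(self, features, grow=True):
        """Return sorted indices of the active binary features.
        With grow=False (e.g. at test time) unseen name=value pairs are dropped."""
        ids = set()
        for name, value in features.items():
            if value is None:                  # missing, e.g. noun not in HowNet
                continue
            key = f"{name}={value}"
            if key not in self.index:
                if not grow:
                    continue
                self.index[key] = len(self.index) + 1
            ids.add(self.index[key])
        return sorted(ids)

def to_libsvm_line(label, feature_ids):
    """One LIBSVM training line: '<label> <index>:1 ...' with ascending indices."""
    return " ".join([str(label)] + [f"{i}:1" for i in feature_ids])

indexer = FeatureIndexer()
ids = indexer.encode({"noun": "苹果", "pos_of_noun": "NN",
                      "sentType": "declarative", "quoted": "no"})
print(to_libsvm_line(1, ids))  # label 1 stands for some classifier id; 1 1:1 2:1 3:1 4:1
```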

6 Results and Discussion

We built SVMs using all the feature sets described in Section 5 and tested them using 10-fold cross-validation. We tried the four types of kernel function in LIBSVM (linear, polynomial, radial basis function (RBF) and sigmoid) and selected the RBF kernel $K(x, y) = e^{-\gamma\|x - y\|^2}$, which gives the highest accuracy. For each feature set, we systematically varied the values of the parameters $C$ (from $2^{-5}$ to $2^{15}$) and $\gamma$ (from $2^{3}$ to $2^{-15}$); we report the best results with the corresponding values of $C$ and $\gamma$. Finally, for each feature set, we ran once on all nouns and once only on nouns occurring twice or more in the corpus.

Algorithm                                | All nouns                    | Nouns occurring 2+ times
Baseline                                 | 50.76%                       | 50.69%
(1) noun only                            | 57.81% (C = 4, γ = 0.5)      | 59.34% (C = 16, γ = 0.125)
(2) ontology only                        | 58.69% (C = 4, γ = 0.5)      | 60.68% (C = 256, γ = 0.125)
(3) noun and ontology                    | 57.81% (C = 16, γ = 0.5)     | 59.46% (C = 16, γ = 0.125)
(4) noun or ontology                     | 58.71%                       | 60.55%
(5) noun, syntactic and lexical features | 52.14% (C = 1024, γ = 0.5)   | 53.51% (C = 16, γ = 0.5)
(6) all features                         | 52.06% (C = 1024, γ = 0.075) | 53.55% (C = 16, γ = 0.5)

Table 3: Accuracy of different algorithms

Columns: 位[wei], 次[ci], 个[ge], 名[ming], 届[jie], 项[xiang]

Row classifier | Most common noun  | Nonempty cells (left to right)
位[wei]        | official          | 24.1 (57.1), 14.7 (34.7)
次[ci]         | convention        | 22.3 (53.3), 1.1 (2.6), 7.6 (18.2)
个[ge]         | project           | 1.0 (7.0), 0.7 (5.2), 0.2 (1.7), 3.3 (24.4)
名[ming]       | person            | 31.7 (55.2), 23.8 (41.4)
届[jie]        | sports tournament | 1.9 (2.1), 29.6 (34.0), 31.5 (36.2)
项[xiang]      | achievement       | 6.6 (11.3), 35.2 (60.4), 1.1 (1.9)

Table 4: Most commonly misclassified classifiers. Each cell shows the percentage of all occurrences of the row classifier that were misclassified as the column classifier and, in parentheses, the percentage of all misclassifications of the row classifier that were misclassified as the column classifier.
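The kernel and the parameter search described at the start of Section 6 can be sketched as follows. `evaluate` is a hypothetical callback that would return the 10-fold cross-validation accuracy for one $(C, \gamma)$ setting; stepping one power of two at a time is our assumption, since only the ranges are given.

```python
import math

def rbf_kernel(x, y, gamma):
    """K(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def grid_search(evaluate, log2c=range(-5, 16), log2g=range(3, -16, -1)):
    """Return (best_score, C, gamma) over C = 2^-5..2^15 and gamma = 2^3..2^-15.
    `evaluate(C, gamma)` is a hypothetical callback returning an accuracy."""
    best = None
    for a in log2c:
        for b in log2g:
            C, gamma = 2.0 ** a, 2.0 ** b
            score = evaluate(C, gamma)
            if best is None or score > best[0]:
                best = (score, C, gamma)
    return best

print(rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5))  # exp(-1), about 0.368
```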

Classifier assignment accuracy is reported in Table 3. The performance of all the SVMs is significantly better than the baseline (paired t-test, p < 0.005). There is no significant difference between the performance with the 1st, 2nd, 3rd and 4th feature sets. However, the performance of the SVMs using lexical and syntactic features (experiments 5 and 6) is significantly worse than the performance on feature sets 1-4 (df = 17.426, p < 0.05).

These results show that lexical and syntactic contextual features do not have a positive effect on the assignment of classifiers. They confirm the intuition that the noun is the single most important predictor of the classifier; however, the semantic class of the noun works as well as the noun itself. In addition, a combination approach that uses semantic class information when the noun is previously unseen does not perform better.
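Both test statistics mentioned above can be computed from per-fold accuracies with Python's standard library; turning them into p-values needs the t distribution's CDF (available, for example, in SciPy), which this sketch omits. The non-integer df = 17.426 is consistent with a Welch (unequal-variance) test on two samples of ten folds, whose degrees of freedom lie between 9 and 18; we read that as a hint, not as something stated in the text.

```python
import math
import statistics

def paired_t(a, b):
    """Paired t statistic and degrees of freedom for per-fold scores a and b."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    t = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n))
    return t, n - 1

def welch_df(a, b):
    """Welch-Satterthwaite degrees of freedom for two independent samples."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    return (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))

# Made-up samples of ten scores with equal variances give the maximum, 2(n - 1) = 18.
print(round(welch_df([0, 1] * 5, [2, 3] * 5), 6))  # 18.0
```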

We also computed the confusion matrix for the most commonly misclassified classifiers. The results are reported in Table 4.
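The two percentages in each cell of Table 4 can be computed from paired gold and predicted labels as follows; the labels below are made up for illustration.

```python
from collections import Counter

def confusion_percentages(gold, predicted):
    """Map each (row, column) error cell to the two percentages used in Table 4:
    share of all row occurrences, and share of all row misclassifications."""
    totals = Counter(gold)
    errors = Counter(g for g, p in zip(gold, predicted) if g != p)
    cells = Counter((g, p) for g, p in zip(gold, predicted) if g != p)
    return {(g, p): (100 * n / totals[g], 100 * n / errors[g])
            for (g, p), n in cells.items()}

gold = ["位"] * 10
predicted = ["位"] * 6 + ["个"] * 3 + ["名"]
print(confusion_percentages(gold, predicted))
# {('位', '个'): (30.0, 75.0), ('位', '名'): (10.0, 25.0)}
```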

For these experiments we used automatic evaluation (cf. (Paul et al., 2002)). A classifier is only judged to be correct if it is exactly the same as that in the original test set. For some noun phrases, however, there are multiple valid classifiers. For example, we can say

‘一[yi] 块[kuai] 金牌[jinpai]’
or
‘一[yi] 枚[mei] 金牌[jinpai]’

(a gold medal).

We did a subjective evaluation on part of our data to evaluate how many automatically generated classifiers are acceptable to human readers. We randomly selected 241 noun-classifier pairs from our data and presented the sentence containing each pair to a human judge who is a native speaker of Mandarin Chinese. We asked the judge to rate all the classifiers generated by our algorithms, as well as the original classifier, by indicating whether each is good (2), acceptable (1) or bad (0) in that sentence context. The classifiers were presented in random order; the judge was blind to the source of the classifiers.

Algorithm                                | Number rated 1 or higher | Percent rated 1 or higher | Average rating
Baseline                                 | 209                      | 86.7%                     | 1.59
(1) noun only                            | 224                      | 92.9%                     | 1.76
(2) ontology only                        | 226                      | 93.8%                     | 1.78
(3) noun and ontology                    | 226                      | 93.8%                     | 1.77
(4) noun or ontology                     | 227                      | 94.2%                     | 1.80
(5) noun, syntactic and lexical features | 218                      | 90.5%                     | 1.67
(6) all features                         | 218                      | 90.5%                     | 1.67
Original                                 | 241                      | 100%                      | 1.95

Table 5: Human evaluation: ratings of classifiers
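The columns of Table 5 follow directly from the per-item ratings. A minimal sketch, with made-up ratings; `summarize_ratings` is a hypothetical name.

```python
def summarize_ratings(ratings):
    """ratings: one of 0 (bad), 1 (acceptable), 2 (good) per judged item.
    Returns (number rated 1 or higher, percent rated 1 or higher, average rating)."""
    acceptable = sum(1 for r in ratings if r >= 1)
    return acceptable, 100 * acceptable / len(ratings), sum(ratings) / len(ratings)

print(summarize_ratings([2, 1, 0, 2]))  # (3, 75.0, 1.25)
```

Applied to the counts in Table 5, e.g. 227 of 241 pairs for feature set 4, the percentage is 100 × 227 / 241 ≈ 94.2%.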

The results of our human evaluation are reported in Table 5. Although our automatic evaluation indicates relatively poor accuracy, 94.2% of the classifiers generated using feature set 4 are rated acceptable or good in our subjective evaluation. Also, the performance of the SVMs with the 1st, 2nd, 3rd and 4th feature sets is significantly better than the baseline (paired t-test, p < 0.005). There is no significant difference between the performance with the 1st, 2nd, 3rd and 4th feature sets. However, the performance of the SVMs using lexical and syntactic features (experiments 5 and 6) is significantly worse than that of those without (p < 0.05). The ratings of the classifiers generated by all our algorithms are significantly worse than those of the original classifiers in the corpus. In future work, we plan to extend this evaluation using more judges.

Which classifier to select also depends on the emotional background of the discourse (Fang, 2003). For example, different classifiers can express different affect for the same noun (e.g. depending on whether a government official is in favor or in disgrace). However, we cannot get this kind of information from our corpus.

7 Conclusions and Future Work

Our machine learning approach to classifier assignment in Chinese performs better than previously published rule-based approaches and works for bigger data sets. The noun is clearly the most important feature (experiment 1). However, we still think ontological features may be useful in classifier assignment, for example for previously unseen nouns, and our experimental results show a trend in this direction, although not a statistically significant one (experiments 2 and 4).

We used the Chinese Treebank for these experiments because it is the only available corpus of parsed Chinese text. Now that we have isolated the relevant features for this task, we plan to conduct further experiments using larger corpora, such as the Chinese Gigaword (Graff and Chen, 2003).

Our use of ontological features could be improved in several ways. First, the ontological features we get from HowNet do not fit our purpose well. For example, the definitions of ‘猫’ (cat) and ‘牛’ (cow) are both ‘livestock’; however, they take different classifiers. To improve the performance of our approach, we need an ontology that correctly groups nouns into classes according to their semantic properties (e.g. type, shape, color, size).

Second, as a more knowledge-rich approach, we could use a complex ontology plus a Chinese classifier dictionary that describes the properties of the objects each classifier can modify. By comparing noun properties and classifier characteristics, classifier assignment could be improved as long as the nouns are in the ontology. However, there are many idiomatic noun-classifier matchings that cannot be captured by dictionaries. Therefore, a combination of rule-based and machine-learning approaches seems most promising.

Third, we can classify Chinese classifiers into groups and focus on those that modify single objects. Certain Chinese classifiers can be used before all plural nouns. Some classifiers specify the container of the objects, for example ‘一[yi] 篮子[lanzi] 苹果[pingguo]’ (a basket of apples); the classifier changes when the container changes. These can be treated differently from sortal and anaphoric classifiers.